Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

Authors

  • Elizabeth M Blue
  • Lei Sun
  • Nathan L Tintle
  • Ellen M Wijsman

Abstract

When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
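
As a rough illustration of how Mendelian transmission can support both error detection and rule-based imputation, the following minimal sketch checks a parent-offspring trio at a single autosomal, biallelic variant. It is not the method used by any of the working groups. The genotype coding (count of alternate alleles: 0, 1, or 2) and the function names `possible_child_genotypes`, `is_mendelian_error`, and `impute_child` are assumptions made for this example only.

```python
from itertools import product

# Assumption: genotypes are coded as the number of alternate alleles (0, 1, 2)
# at one autosomal, biallelic variant. All names here are hypothetical.

def gametes(genotype):
    """Alleles (0 = reference, 1 = alternate) that a parent can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def possible_child_genotypes(father, mother):
    """Child genotypes consistent with Mendelian segregation."""
    return {a + b for a, b in product(gametes(father), gametes(mother))}

def is_mendelian_error(father, mother, child):
    """True if the observed child genotype cannot arise from the parents.

    A flagged trio may reflect a genotyping error, a pedigree error,
    or a de novo mutation; this check cannot tell them apart.
    """
    return child not in possible_child_genotypes(father, mother)

def impute_child(father, mother):
    """Rule-based imputation: return the child genotype only when the
    parental genotypes determine it uniquely, otherwise None."""
    options = possible_child_genotypes(father, mother)
    return next(iter(options)) if len(options) == 1 else None
```

With two opposite homozygous parents, `impute_child(0, 2)` yields a heterozygous child, whereas two heterozygous parents leave the child undetermined and the function returns `None`. This is one reason a purely rule-based approach covers fewer loci and subjects than probability-based methods.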

Similar resources

Complex segregation analysis of obsessive-compulsive disorder in families with pediatric probands.

OBJECTIVE The purpose of this study was to assess the mode of inheritance for obsessive-compulsive disorder (OCD) in families ascertained through pediatric probands. METHODS We ascertained 52 families (35 case and 17 control families) through probands between the ages of 10 and 17 years. Direct interviews were completed with 215 individuals. Family informant data were collected on another 450...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...

Evaluation the Genetic Diversity and Transgressive Segregation for Yield and Yield Components of F6 Linseed (Linum usitatissimum L.) Lines Derived from KO37 × CAN1066 Cross

   In order to investigate the genetic variation and transgressive segregation of some agronomic traits in F6 linseed breeding lines derived from KO37 × CAN1066 cross, 824 breeding lines (selected from 106 F5 lines) evaluated using an augmented design along with five-control genotypes. The highest genotypic coefficient of variation (GCV) were observed for seed weight / capsule (53.39%), whereas...

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

Genetic and genomic discovery using family studies.

Genetic studies traditionally have been performed on sets of related individuals, that is, families. Mendel’s early studies in sweet peas (Pisum sativum) on the inheritance patterns of discrete traits from parents with specific mating types to offspring has shed light on the basic mechanisms of inheritance, including the fundamental laws of segregation of discrete factors (genes) from parents t...

Journal title:
  • Genetic epidemiology

Volume 38, Suppl 1

Publication date: 2014